Multi-Document Discourse Parsing Using Traditional and Hierarchical Machine Learning

Authors

  • Erick Galani Maziero
  • Thiago Alexandre Salgueiro Pardo
Abstract

Multi-document processing is essential today, when many documents on the same topic are produced, especially on the Web. Both readers and computer applications can benefit from a discourse analysis of this multi-document content, since it makes the relations among portions of these documents explicit. This work aims to identify such relations automatically using machine learning techniques. In particular, it focuses on identifying the relations predicted by the Cross-document Structure Theory (CST). The obtained results improve on the state of the art.

Similar resources

Machine Learning Approaches to Shallow Discourse Parsing: A Literature Review

This document reviews the literature on shallow discourse parsing, in particular the use of machine learning techniques. This is deliverable Y1.M6 of the Discourse Parsing White Paper which is part of the MDM IP of the IM2 project.

Discourse Parsing with Attention-based Hierarchical Neural Networks

RST-style document-level discourse parsing remains a difficult task and efficient deep learning models on this task have rarely been presented. In this paper, we propose an attention-based hierarchical neural network model for discourse parsing. We also incorporate tensor-based transformation function to model complicated feature interactions. Experimental results show that our approach obtains...

Unsupervised Learning for Natural Language Processing

Given the abundance of text data, unsupervised approaches are very appealing for natural language processing. We present three latent variable systems which achieve state-of-the-art results in domains previously dominated by fully supervised systems. For syntactic parsing, we describe a grammar induction technique which begins with coarse syntactic structures and iteratively refines them in an ...

Hybrid Approach to PDTB-styled Discourse Parsing for CoNLL-2015

This paper describes our end-to-end PDTB-styled discourse parser for the CoNLL-2015 shared task. We employed a machine learning-based approach to identify discourse relation between text spans for both explicit and implicit relations and employed a rule-based approach to extract arguments of the discourse relations. In particular, we focus on improving the implicit discourse relation identifica...

CSTParser – a multi-document discourse parser

This paper presents the CSTParser, a multi-document discourse parser. Based on machine learning techniques and hand-crafted rules, the system identifies a set of relations predicted by CST (Cross-document Structure Theory) among sentences of different texts on the same topic.

Publication date: 2011